Abstract

We describe a scheduler based on the microeconomic paradigm for scheduling on-line
a set of parallel jobs in a multiprocessor system. In addition to increasing the system
throughput and reducing the response time, we consider fairness in allocating system
resources among the users, and provide the user with control over the relative performances
of his jobs. Every user has a savings account in which he receives money at a constant
rate. To run a job, the user creates an expense account for that job to which he transfers
money from his savings account. The job uses the funds in its expense account to obtain
the system resources it needs. The share of the system resources allocated to the user is
directly related to the rate at which the user receives money; the rate at which the user
transfers money into a job expense account controls the job's performance.

We prove that starvation is not possible in our model. Simulation results show that our
scheduler improves both system and user performances in comparison with two different
variable partitioning policies. It is also effective in guaranteeing fairness and providing
control over the performance of jobs.


This author was supported by NSF grant CCR-9024954, by DOE grant DE-FO05-94ER25210, and by the
National Aeronautics and Space Administration under NASA Contract NAS1-19480 while in residence at the
Institute for Computer Applications in Science and Engineering (ICASE), MS 132C, NASA Langley Research
Center, Hampton VA 23681-0001.
1 Introduction


We describe a microeconomic approach for scheduling on-line a set of jobs in a parallel system
with identical processors. This approach exploits the following similarity between the scheduling
and resource allocation problems in a computer system and in a real economic system:
each system involves independent agents that compete for common resources in pursuing their
goals. We adopt an open-market strategy, which has proved to be successful in dealing with the
enormous complexity of real economic environments.

The microeconomic approach has several advantages over other algorithms that have been
developed for this scheduling problem [5]. The usual formulations of this problem seek to maximize
the system throughput and minimize the user response time, but in practice there are additional
requirements that schedulers must satisfy. The first of these is to ensure fairness in resource
allocation among the users. A second requirement is to give the user flexibility in controlling the
relative share of resources allocated among his jobs. We show that both these features can be
incorporated into the microeconomic approach in a very natural way, while this is not true of
many of the earlier scheduling algorithms.

Scheduling problems are usually formulated as optimization problems of minimizing the
maximum completion time or the maximum lateness [1, 8, 9, 17]. Since even simplified formulations
of scheduling problems are NP-hard in general [1, 8, 17], many sub-optimal algorithms have been
proposed [3, 9]. The more complex scheduling problem considered here is also NP-hard, and the
microeconomic approach leads to a heuristic algorithm for the problem. We show by simulation
that this algorithm improves both system and user performances relative to two different variable
partitioning policies [5].

The microeconomic paradigm was applied to the resource allocation problem by Miller
and Malone from MIT, Drexler and Huberman from Xerox, and others [1, 11, lb] at the end of
the eighties. In the last few years, several schedulers based on this paradigm have been proposed
[4, 19]. These schedulers use the auction mechanism to allocate resources among competing users.
At the beginning of every time-slice, the resource initiates an auction in which the interested
users participate by bidding monetary funds that increase over time. The client that offers the
highest bid acquires the resource for the next time-slice. The price per time-slice is directly
related to the level of competition for that resource; if the competition increases, the price also
increases. In this way, as in real economic environments, the users are encouraged to maximize
their profit, i.e., to devote their funds to resources that are more important for them. These
schedulers were intended more for distributed systems in which resources are allocated in an
uncorrelated manner. Therefore, these systems were suited more for coarse-grained asynchronous
parallel applications, such as Monte-Carlo simulations [19]. In contrast, the majority of parallel
scientific applications are highly synchronous, in that an application requires a specified number
of processors to be available during the same interval of time. Another problem with these
schedulers is that holding an auction at the beginning of every time-slice incurs a high overhead.

A microeconomic algorithm for balancing the load in distributed systems was suggested by
Ferguson et al. [7]. Jobs are assumed to arrive independently at every processor in the system.
Upon arrival, each job evaluates the cost to run locally or to migrate and execute on another
processor. If a job migrates, it has to pay for the communication bandwidth required. Their
experiments show that the algorithm is effective in allocating processors and communication
resources.

A market-based approach was proposed by Cheriton and Harty [2] for system memory
allocation. In their system, the memory manager deposits money in a process account, proportional
to the share of the resources that process has to receive. Unlike a real market, the resource
prices are assumed to be fixed. When it has enough money in its account, the process “leases”
the required amount of memory for a bounded interval of time. At the application level, this
approach proved to be effective in controlling the amount and the interval of time for which the
memory is allocated, on uniprocessor and shared-memory multiprocessor systems.

The remainder of the paper is organized as follows. In the next section we present the model
in detail. In Section 3, we prove that starvation is not possible in our model. Section 4
describes the simulation results. Finally, in Section 5 we summarize our results and indicate
some future directions for extending our work.


2 The Model 

We consider a parallel computer consisting of $N$ identical processors interconnected by a general
communication network. We assume that the communication parameters for any pair of
processors do not depend on their relative position,¹ and therefore the system may be arbitrarily
partitioned. Every job specifies, upon its arrival, the number of processors $p$ it needs, and the
estimated computation time. Once processors are allocated, they are guaranteed to be exclusively
used by the job for the entire duration of its execution. Also, the job is assumed to acquire or
release all $p$ processors at the same time.

The computation system is modeled as a microeconomic environment in which different users
compete for obtaining system resources in order to run their jobs. To get the requested resources
the user has to pay the price asked by the system. As in real life, the buyers (users) and the
sellers (system) have antagonistic goals: the users wish to run their jobs as fast as possible with
minimum expenses, while the system wants to maximize its income.

The flow of currency in the system is depicted in Figure 1. Every user has a savings account
in which he receives money at a constant rate, as long as he has less than a specified amount
of funds. Whenever a user decides to run a job, he creates an expense account for that job
to which money from his savings account is transferred. The job uses this account to buy the
resources it needs. Once the job is scheduled for execution, all of its money (and depending on
the strategy, possibly all the money it receives until it terminates) is transferred to the system
account. In order to maximize the system income, the scheduler applies a simple strategy: it
allocates available resources to the job that offers the best price. In a loaded system, it is possible
that not all $p$ processors that were requested by a job become available at the same time. In
this case, when the job is scheduled it is asked to pay for the wasted resources also. In this way
resource fragmentation is discouraged.

For convenience, throughout this paper we refer to the monetary unit as a dollar and to the
time unit as a minute. The notations used in this paper are summarized in Table 1.

¹ This is a reasonable assumption for many modern multiprocessor architectures (e.g., IBM SP-1/2, Intel
Paragon).

[Figure 1 (diagram not recoverable; surviving label: income, constant rate)]

Figure 1: The currency flow.


Symbol          Meaning
--------------  ----------------------------------------------------------------------
$M_i$           The maximum amount of funds user $i$ can have in his savings account.
$R_i$           The rate at which user $i$ receives income, when he has less than
                $M_i$ dollars in his savings account.
$m_i(t)$        The amount of money user $i$ has in his savings account at time $t$.
$R$             The income rate over all users in the system: $R = \sum_i R_i$.
$J_{ik}$        The $k$th job started by user $i$.
$r_{ik}(t)$     The rate at which user $i$ transfers money into job $J_{ik}$'s
                expense account at time $t$.
$m_{ik}(t)$     The amount of money job $J_{ik}$ has in its expense account at time $t$.
$N_{ik}$        The number of processors requested by $J_{ik}$.
$T_{ik}$        The estimated service time required by job $J_{ik}$ when $N_{ik}$
                processors are used.
$E_{ik}$        The estimated cumulative computation time for $J_{ik}$, i.e.,
                $E_{ik} = N_{ik} T_{ik}$.

Table 1: The notations used in this paper.





2.1 The User Savings Account 

Every user has a savings account in which he accumulates funds for buying resources required
by his jobs. The maximum amount of money the user $i$ can deposit in his savings account is
bounded by $M_i$. While the user has less than $M_i$ dollars, he receives money at a constant rate
$R_i$. Intuitively, this can be visualized as a system in which every user has a tank with capacity
$M_i$ where he saves his earnings for future consumption. While the tank is not full, the inlet-valve
is open and the tank is filled at a constant rate $R_i$; once the tank is full, the inlet-valve is closed.

Limiting the maximum funds in a user's savings account is necessary to avoid disruption in
system utilization. Suppose a user does not use the computer for a long period of time (e.g.,
during his holiday). Without this limitation it is possible for the user to acquire enough money to
monopolize the system for an appreciable interval of time (e.g., several hours) when he returns,
which will preclude other users from running their jobs.

During a time interval $\Delta t$, user $i$ receives at most $R_i \Delta t$ dollars and spends at most
$M_i + R_i \Delta t$ dollars (provided he has $M_i$ dollars at the beginning of the time interval, and
spends all his savings and earnings during the interval $\Delta t$). Notice that for a sufficiently large
interval of time ($\Delta t \to \infty$) the amount $R_i \Delta t$ is the dominant term in the money spent by
user $i$. Thus, over large intervals of time, $R_i$ dictates how much money the user $i$ can spend on
the average for acquiring system resources.

Next, notice that $R_i$ is directly related to the share of the system resources that user $i$ receives;
the higher $R_i$ is, the more resources the user can buy. Moreover, if two users compete for resources
at the same time, they will get a share that is roughly proportional to their expenditures (since
the resource prices will be the same). If both users spend at the same rate as they receive money,
then the ratio of their share of the resources would be roughly proportional to their income
rates. This discussion of “fairness” assumes that users compete at the same time. Otherwise, it
is possible for a user with less money to buy more resources than another user with more money.
Consider the user who runs his jobs at night, when the system is lightly loaded and resource
prices are low, rather than during the day when the system is heavily loaded.

Although the income rate $R_i$ determines the maximum spending rate over large intervals of
time, a user with a lower income rate should be able to execute urgent tasks when needed. This
is possible in our model since for short intervals of time a user can spend much more than his
income. Specifically, let $m_i$ ($m_i \le M_i$) be the amount of funds user $i$ has in his savings account
at the beginning of the time interval $dt$. Then, the user can spend $m_i + R_i\,dt$ dollars during the
interval $dt$, and therefore the average spending rate $(m_i/dt) + R_i$ could be much higher than $R_i$.
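
The tank analogy and the short-interval spending bound can be illustrated with a minimal Python sketch. The function names `accrue` and `max_average_spending_rate` are hypothetical and not part of the paper; the sketch only assumes that income accrues linearly until the cap $M_i$ is reached.

```python
def accrue(balance, income_rate, cap, dt):
    """Savings balance after dt minutes with no spending.

    Income arrives at income_rate dollars/minute while the balance is
    below cap (the tank capacity M_i); once full, the inlet closes.
    """
    if dt < 0:
        raise ValueError("dt must be non-negative")
    return min(cap, balance + income_rate * dt)


def max_average_spending_rate(balance, income_rate, dt):
    """Upper bound (m_i / dt) + R_i on the average spending rate over dt."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return balance / dt + income_rate


# A user who saved 60 dollars and earns 1 dollar/minute can spend at
# up to 7 dollars/minute on average over a 10-minute burst.
burst_rate = max_average_spending_rate(60.0, 1.0, 10.0)
```

Over long intervals the `balance / dt` term vanishes, which is the sense in which $R_i$ alone governs the long-run share.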


2.2 The Job Expense Account 

When a user wants to run a job he has to specify its estimated running time and the number of
processors needed. At the same time, for every job he wishes to run, the user creates an expense
account to which he begins to transfer funds from his savings account. In contrast with the user's
income rate, which is constant, the rate at which money is transferred into the expense account
of a job is variable, and is specified by the user. These funds are used to buy the resources
required by the job. In this way, the user has the flexibility to adjust his expenses according to
the number and relative importance of his jobs. This is similar to the real-life situation in which
people receive a fixed salary per month but have the freedom to spend their money according to
their needs.

When a user submits a job to be executed, an expense account is created for it, and the job is
inserted into a list called the ready-list. Whenever a set of processors becomes idle, the scheduler
scans the ready-list and selects the job that offers the best price (see the next section for details).
If there are enough idle processors available, then the selected job can be executed immediately.
Two approaches are possible for the manner in which the scheduler computes the funds that a
job could afford to spend for acquiring the resources at a time $t$ (denoted by $m'_{ik}(t)$). In one,
it considers only the current funds in the expense account of the job; in the other, it considers
the future earnings of the job also. More specifically, let $J_{ik}$ be a job in the ready-list, belonging
to the user $i$, that at time $t$ has $m_{ik}(t)$ dollars in its expense account and receives money at a
rate $r_{ik}(t)$. Then, in the first approach, the scheduler evaluates $J_{ik}$ to have $m_{ik}(t)$ dollars. In the
second approach, the scheduler finds that the job can spend

$$ m'_{ik}(t) = m_{ik}(t) + \int_{t}^{t_f} r_{ik}(t')\,dt', \tag{1} $$

where $t_f$ is $J_{ik}$'s estimated finishing time ($t_f$ is the current time $t$ plus the estimated waiting
time plus the estimated running time). Equation (1) has only a theoretical importance, since in
practice it is hard to estimate how $r_{ik}$ will vary in the future. This depends both on the user's
strategy and the current set of jobs he has to run. A guaranteed lower bound $r'_{ik}$ on the rate at
which $J_{ik}$ will receive money in its expense account could be used to obtain a simple estimate
of the future income. Then, at time $t$, the scheduler can assume that $J_{ik}$ can spend at least
$m'_{ik}(t) = m_{ik}(t) + r'_{ik}(t_f - t)$ dollars. Notice that if $r'_{ik} = 0$ (user $i$ does not guarantee any future
money transfer for the job), then this reduces to the first approach since $m'_{ik}(t) = m_{ik}(t)$.
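
As a small illustration of the lower-bound estimate, the following Python sketch computes the funds a job can be assumed to afford. The function name `affordable_funds` and its parameter names are hypothetical; the only assumption is the linear bound $m_{ik}(t) + r'_{ik}(t_f - t)$ described above.

```python
def affordable_funds(balance, guaranteed_rate, now, finish_time):
    """Lower bound on what a job can spend by its estimated finish time.

    balance         -- current funds m_ik(t) in the expense account
    guaranteed_rate -- guaranteed minimum transfer rate r'_ik (may be 0)
    now             -- current time t
    finish_time     -- estimated finishing time t_f (wait + run included)
    """
    if finish_time < now:
        raise ValueError("finish_time must not precede now")
    return balance + guaranteed_rate * (finish_time - now)


# With no guaranteed future income the estimate is just the balance
# (the first approach); a positive guarantee adds the future transfers.
first_approach = affordable_funds(10.0, 0.0, 5.0, 25.0)
second_approach = affordable_funds(10.0, 0.5, 5.0, 25.0)
```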

For simplicity, the transfer of money between a user's savings account and the job expense
account is unidirectional, in that money cannot be transferred back into the savings account.
For example, if a job buys some resources for a certain interval of time but finishes earlier than
predicted, then the balance cannot be returned to the user. On the other hand, if a job
fails to finish at the predicted time, then it will be allowed to continue for some time while
being charged for the additional time, as long as the user can afford to pay; otherwise it will be
terminated. This simple solution motivates the user to provide accurate estimates for the job
service time.

2.3 The Price of Computation 

We have considered two strategies for establishing the price of computation.

The first approach is similar to the one used in other microeconomic systems [1, 19]. In this
approach, time is assumed to be divided into intervals called time-slices. At the beginning of
every time-slice the scheduler computes the prices offered by all the jobs in the ready-list. If
the job that uses the resource has not finished yet, then it is allowed to execute for the current
time-slice if it can continue to pay at the current price; otherwise the job that has offered the
highest price is scheduled to run for the current time-slice. Since the price is evaluated at every
time-slice, this scheme accurately reflects the market trends in prices (e.g., when competition
increases, the price also tends to increase). Unfortunately, this approach has several drawbacks.
Figure 2: The execution time diagram for two processors. The shaded area next to a processor
indicates that the processor is free while the white area indicates that it is busy. At time 0
(Figure 2(a)) processor 1 is free while processor 2 is busy for 4 minutes. Also, at time 0 there are
two jobs A and B in the ready-list; A needs one processor for 5 minutes, while B requires two
processors for 4 minutes. Figures 2(b) and 2(c) show two possible schedules.

First, evaluating the highest offer at every time-slice incurs a high overhead. Second, a job would
not know at the beginning how much it has to pay to complete execution, and could run out of
money before termination due to unexpected price changes.

The second strategy is to negotiate a price that is constant for the entire period of the
computation. The main disadvantage of this strategy is that for large intervals of time the price may
no longer reflect the level of competition for the resources. For example, if a user starts several
jobs early in the morning before other users submit their jobs, he can get all the resources at zero
cost since there is no competition. But, if his jobs take several hours to complete, then no other
user can run jobs during this time. This would compromise our objective to ensure fairness.
A common solution to this problem is to gather statistics and predict the price per minute for
future processor utilization. Since the prediction is more accurate over large intervals of time, we
use a weighted function in order to establish the price for the next $\Delta t$ minutes at time $t$. More
precisely, $p(t)$, the price at time $t$, is


$$ p(t) = p_e(t, \Delta t) + \bigl(p_a(t) - p_e(t, \Delta t)\bigr)\, e^{-\alpha \Delta t}, \tag{2} $$

where $p_a(t)$ represents the current highest offer at time $t$, $p_e(t, \Delta t)$ is the estimated price for the
next $\Delta t$ minutes, and $\alpha$ is a positive constant. When $\Delta t \to 0$ the price $p(t)$ goes to $p_a(t)$, while
for large values of $\Delta t$ ($\Delta t \to \infty$) the price $p(t)$ tends to the estimated value $p_e(t, \Delta t)$. Thus,
every job that is scheduled to start at time $t$ and run for the next $\Delta t$ minutes is asked to pay at
least $p(t)$ dollars/minute.
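
The weighting can be sketched in Python as follows. This is a minimal illustration that assumes an exponential weight $e^{-\alpha \Delta t}$, a form chosen here because it reproduces the two limits stated above (current highest offer for short jobs, estimated price for long ones); the function and parameter names are hypothetical.

```python
import math


def blended_price(highest_offer, estimated_price, dt, alpha):
    """Minimum price per minute for a job running for the next dt minutes.

    highest_offer   -- current highest offer p_a(t)
    estimated_price -- predicted price p_e(t, dt) from gathered statistics
    dt              -- estimated run length in minutes
    alpha           -- positive constant controlling how fast the weight decays
    """
    if dt < 0 or alpha <= 0:
        raise ValueError("dt must be >= 0 and alpha must be > 0")
    weight = math.exp(-alpha * dt)  # assumed weighting; equals 1 when dt = 0
    return estimated_price + (highest_offer - estimated_price) * weight


short_job = blended_price(5.0, 2.0, 0.0, 0.1)     # equals the highest offer
long_job = blended_price(5.0, 2.0, 10000.0, 0.1)  # approaches the estimate
```

A job that starts when there is no competition (highest offer of zero) is therefore still charged close to the predicted price if it intends to run for a long time.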

We chose the second approach for two reasons: first, the completion time for computation 
bounded jobs can be predicted with good accuracy; and second, the algorithm is simpler and 
more efficient to implement. 

When the scheduler scans the ready-list, it computes the price per minute offered by every job
$J_{ik}$ as a function $f$ of: the predicted service time $T_{ik}$, the number of requested processors $N_{ik}$,
and the estimated expenses $m'_{ik}(t)$. We describe the details in the next two paragraphs.

First, consider a job $J_{ik}$ that needs only one processor ($N_{ik} = 1$). In this case, the price
offered by $J_{ik}$ is computed as $f(1, T_{ik}, m'_{ik}(t)) = m'_{ik}(t)/T_{ik}$. Next, consider a job $J_{ik}$ that requires
$N_{ik}$ processors, where $1 < N_{ik} \le N$. If at least $N_{ik}$ processors become free at the same time, then
the price offered by $J_{ik}$ is computed as $f(N_{ik}, T_{ik}, m'_{ik}(t)) = m'_{ik}(t)/(N_{ik} T_{ik}) = m'_{ik}(t)/E_{ik}$. When
the first $N_{ik}$ processors to become free finish at different times (as is more probable), deciding what
job to run next in order to maximize the system income is difficult. To see why, consider the example
shown in Figure 2(a). The system consists of two processors such that when the first processor
becomes free, the second one requires 4 minutes to process its current task. Now, assume that
there are two jobs: A requires one processor for 5 minutes and offers 3 dollars/minute, and B
requires two processors for a total of 8 minutes (4 minutes on each processor) and offers to pay
4 dollars/minute. What job must be scheduled first in order to maximize the system income?
The second job offers a higher price per minute but cannot start as long as the second processor
is busy, while the first job, although it offers a lower price, can start immediately. The following
examples show that there is no unique answer. If the next job to be executed requests two
processors, then clearly scheduling A first (Figure 2(b)) is better, since both processors are free
after 9 minutes. On the other hand, if the next job to be executed arrives at $t = 1$, requires
exactly 3 minutes, and pays 6 dollars/minute, then it can be immediately scheduled on processor
1, and therefore scheduling B first maximizes the system income (Figure 2(c)).

Our solution to this problem is the following. In computing the price for a job, the
scheduler takes into account not only the effective cumulative computation time ($E_{ik}$), but also the
computation time that is wasted while waiting for other processors (requested by the job) to be
available. In the example, when B is scheduled it wastes four minutes of processor 1 unless there
is another job in the ready-list that can fit in the space. Consequently, the scheduler asks the
job to pay also for the potentially wasted four minutes, and B is estimated to require 12 minutes
(= 4 minutes x 2 processors + 4 wasted minutes). Hence, the real price per minute offered by
B is scaled proportionally, i.e., $4 \cdot \frac{8}{12} = 2.66\ldots$. With this modification, the scheduling algorithm
will continue to select the job that offers the highest real price per minute (in this example, A).

Thus, in this case we compute the price offered by $J_{ik}$ as being

$$ f(N_{ik}, T_{ik}, m'_{ik}(t)) = \frac{m'_{ik}(t)}{W_{ik} + N_{ik} T_{ik}} = \frac{m'_{ik}(t)}{W_{ik} + E_{ik}}, \tag{3} $$

where $W_{ik}$ is the wasted computation time in scheduling $J_{ik}$ to run on the first $N_{ik}$ processors
that become available. Notice that asking parallel jobs to pay for potentially wasted resources
discourages fragmentation in processor allocation.
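
The worked example with jobs A and B can be checked with a short Python sketch. The function name `offered_price` is hypothetical; the funds of each job are derived from the quoted prices (A: 3 dollars/minute over 5 processor-minutes, B: 4 dollars/minute over 8 processor-minutes), which is an assumption made only for this illustration.

```python
def offered_price(funds, processors, service_time, wasted=0.0):
    """Real price per processor-minute offered by a job.

    funds        -- what the job can afford to spend, m'_ik(t)
    processors   -- number of requested processors, N_ik
    service_time -- estimated service time T_ik on those processors
    wasted       -- processor-minutes W_ik left idle while waiting for
                    the remaining processors to become free
    """
    total_minutes = processors * service_time + wasted
    if total_minutes <= 0:
        raise ValueError("total processor-minutes must be positive")
    return funds / total_minutes


price_a = offered_price(15.0, 1, 5.0)               # A starts at once: 3.0
price_b = offered_price(32.0, 2, 4.0, wasted=4.0)   # B idles processor 1: 32/12
chosen = "A" if price_a > price_b else "B"
```

Charging B for the four idle minutes lowers its real price below that of A, so the scheduler in this example picks A.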


2.4 The User Strategy 

Generally, the user can implement any mechanism for allocating funds to his jobs in our model.
Unfortunately, this freedom makes it very hard to analyze and even simulate such a model. Hence
we propose a simple strategy that we consider to be flexible enough for practical use. As in other
scheduling policies, the idea is to group jobs into different classes. But while in other policies
this classification is done at the central level from the system point of view (e.g., based on the
resource requirements), in our case the classification is done at the user level. For example, the
user can classify his jobs based on their urgency, their resource requirements, etc.

Let $C_{i1}, C_{i2}, \ldots, C_{is}$ be $s$ classes to which the jobs of user $i$ may belong. We associate a coefficient
$\alpha_{ij}$ with each job class $C_{ij}$, chosen such that $\sum_{j=1}^{s} \alpha_{ij} = 1$. Recall that $E_{ik}$ is the cumulative
computation time requested by a job $J_{ik}$, and let $E_i$ be the weighted sum over all the estimated
cumulative computation times of all jobs of user $i$ that are in the ready-list. Hence we have

$$ E_i = \sum_{j=1}^{s} \alpha_{ij} \sum_{J_{ik} \in C_{ij}} E_{ik}. \tag{4} $$

Then the transfer rate to the expense account of $J_{ik}$ is given by the following formula:

[Equation (5) is missing from the source text.]

Notice that if user $i$ has at least one job, then the sum of the transfer rates into the expense
accounts of his jobs is equal to his income rate $R_i$.

In this strategy the classification reflects the importance of the jobs; the higher the coefficient
$\alpha_{ij}$, the higher the price increase a job belonging to the class $C_{ij}$ can afford to pay.

This strategy can be further refined by allowing the coefficients to be dynamically changed in 
order to achieve certain objective functions (see Section 4 for details). 
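
One way to realize such a class-based strategy is sketched below in Python. The allocation rule used here, in which each job receives a share of the income rate proportional to its class coefficient times its cumulative computation time, is an assumption: it is one form that satisfies the stated property that the transfer rates of a user sum to his income rate, and is not claimed to be the exact formula of the paper. All names are hypothetical.

```python
def transfer_rates(income_rate, class_weights, jobs):
    """Split a user's income rate among his jobs in the ready-list.

    income_rate   -- the user's income rate R_i
    class_weights -- mapping from class name to coefficient (summing to 1)
    jobs          -- list of (job_id, class_name, cumulative_time) tuples

    Assumed rule: rate = R_i * weight * cumulative_time / weighted_total,
    where weighted_total is the weighted sum of all cumulative times.
    """
    weighted_total = sum(class_weights[cls] * e for _, cls, e in jobs)
    if weighted_total == 0:
        return {}
    return {
        job_id: income_rate * class_weights[cls] * e / weighted_total
        for job_id, cls, e in jobs
    }


rates = transfer_rates(
    10.0,
    {"urgent": 0.75, "batch": 0.25},
    [("a", "urgent", 100.0), ("b", "batch", 100.0)],
)
```

With equal-sized jobs, the urgent job receives three times the funding rate of the batch job, and the two rates add up to the user's full income rate.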

2.5 Implementation Issues 

The overhead introduced by the scheduler is as important as the scheduler performance itself. 
Therefore, in this section we briefly describe some implementation issues. 

The information in the ready-list is modified in one of the following cases: a new job arrives
in the list, a job terminates and its processors become available, and the rate $r_{ik}$ (at which a job
receives money from its user) changes. Since the first case is trivial (the job is appended to the
ready-list), we discuss only the other two.

When a subset of processors becomes free and there is no other job that is scheduled to be
executed, the job in the ready-list that offers the highest price is scheduled. If there are enough
available processors, the job is executed immediately; otherwise it has to wait until enough
processors become free. If a job is already scheduled for execution, then the scheduler checks
whether there are enough free processors for that job. If so, the job is loaded, and the scheduler
scans the ready-list to schedule a new job. The complexity of finding the next job to schedule
is linear in the number of jobs in the ready-list, since the list is scanned only once to schedule a
job.

When $r_{ik}(t)$, the rate at which user $i$ transfers money into $J_{ik}$'s expense account, changes,
the amount of money in the account, $m_{ik}(t)$, is updated. For simplicity of exposition, assume
that $r_{ik}(t)$ is constant between two subsequent changes (the case when $r_{ik}(t)$ is an arbitrary
function can be treated similarly). Suppose that at $t = t_1$ the rate $r_{ik}(t)$ was changed and the
expense account was updated accordingly. Then, when $r_{ik}(t)$ is changed again at $t = t_2$, we have
$m_{ik}(t_2) = m_{ik}(t_1) + r_{ik}(t_1)(t_2 - t_1)$. Thus the scheduler can compute $m_{ik}(t)$ at any future time
$t > t_1$ before the rate changes again. Recall from the previous section that in our scheme the
rate $r_{ik}(t)$ is changed only when a new job arrives in the ready-list, or a job finishes execution.
Then the rates are changed for all jobs belonging to user $i$. In the worst case all the jobs in the
ready-list belong to user $i$, and hence the complexity of the updates caused by a change in these
transfer rates is linear in the number of jobs in the list.
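
The update rule lends itself to a lazy representation in which a balance is only materialized when the rate changes. The Python sketch below illustrates this under the piecewise-constant rate assumption made above; the class name `ExpenseAccount` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExpenseAccount:
    balance: float      # funds at the time of the last rate change
    rate: float         # transfer rate since the last rate change
    last_update: float  # time of the last rate change

    def balance_at(self, t):
        """Funds at time t >= last_update, assuming the rate stays constant."""
        if t < self.last_update:
            raise ValueError("t precedes the last update")
        return self.balance + self.rate * (t - self.last_update)

    def set_rate(self, t, new_rate):
        """Materialize the balance at time t, then switch to new_rate."""
        self.balance = self.balance_at(t)
        self.last_update = t
        self.rate = new_rate


account = ExpenseAccount(balance=0.0, rate=2.0, last_update=0.0)
account.set_rate(5.0, 1.0)           # 10 dollars accumulated by t = 5
later = account.balance_at(8.0)      # 10 + 1 * 3 = 13 dollars
```

Each rate change costs constant time per job, which matches the linear bound stated above when all jobs in the list belong to one user.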

If the ready-list is too large for an algorithm that is linear in the number of jobs to be
satisfactory, then a variant of the algorithm in which only the first $n$ jobs from the list are considered
for scheduling can be implemented, where $n$ is a parameter that can be specified.
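
A single-pass selection with an optional cut-off can be sketched in Python as follows. The function `select_job`, the `limit` parameter, and the tuple representation of jobs are hypothetical choices made for illustration.

```python
def select_job(ready_list, price_of, limit=None):
    """Return the job offering the highest price, or None if the list is empty.

    ready_list -- jobs in arrival order
    price_of   -- function mapping a job to its offered price per minute
    limit      -- if given, only the first `limit` jobs are considered
    """
    candidates = ready_list if limit is None else ready_list[:limit]
    if not candidates:
        return None
    # One scan over the candidates: linear in the number of jobs considered.
    return max(candidates, key=price_of)


jobs = [("A", 3.0), ("B", 2.5), ("C", 4.0)]
best_overall = select_job(jobs, lambda job: job[1])
best_of_first_two = select_job(jobs, lambda job: job[1], limit=2)
```

Restricting the scan trades some income for bounded overhead: job C is never seen when only the first two entries are examined.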

3 Non-Starvation

Every scheduling algorithm has to address the fundamental problem of starvation, the situation
where a job waits indefinitely to acquire the resources it needs to run. In this section we prove
that starvation is not possible in our model. We make two assumptions: first, the running time of
every job is bounded above by $T_{max}$, and second, there exists a lower bound $r_{min}$ on the transfer
rate from the user savings account to every job's expense account. Also, we assume that the
number of users $m$ is bounded.²

Let us consider a job $J$ that requires $p$ processors for an estimated service time $T$ on a parallel
computer with $N$ identical processors. For convenience denote the time at which $J$ enters the
ready-list by $t = 0$. After $\Delta t$ minutes, $J$ has in its expense account at least $r_{min} \Delta t$ dollars. In
order to be scheduled, a job has to offer the highest price per minute during the sum of the
required computation time $pT$ and the time the job spends in waiting for $p$ processors to become
free. The largest amount of money that job $J$ has to pay is when there are $p - 1$ free processors
and the remaining $N - p + 1$ processors finish after exactly $T_{max}$ minutes. Therefore, to be
scheduled the job $J$ has to pay for at most $pT + (p - 1)T_{max}$ minutes. Let $\Delta t_1$ be the time
interval at which the following equality is true:

$$ \frac{r_{min} \Delta t_1}{pT + (p - 1)T_{max}} = \frac{R}{N - p + 1} + \eta, \tag{6} $$

where $R$ represents the sum over all the income rates received by all users and $\eta$ is an arbitrary
positive constant. In words, Equation (6) says that after $\Delta t_1$ minutes the job $J$ can pay at least
$\frac{R}{N - p + 1} + \eta$ dollars/minute. Next, let $t_1$ be the time at which Equation (6) becomes true. Clearly,
at some time between $t = t_1$ and $t = t_1 + T_{max}$ all the jobs that were running at $t_1$ will finish and
therefore other jobs will be scheduled to run. If $J$ is not scheduled in this interval then there are
at least $N - p + 1$ processors that receive more dollars/minute than what $J$ could offer.

Let $M = \sum_i M_i$ denote the total funds that all users have in their savings accounts at time
$t_1$. Then between $t_1$ and some future time $t_2$ all the other jobs, excepting $J$, can spend at
most $M + (R - r_{min})(t_2 - t_1)$ dollars for acquiring the resources. As observed before, if $J$ is
not scheduled by $t_1 + T_{max}$, then there are other jobs that have paid a higher price for at least
$N - p + 1$ processors. Let $\Delta t_2$ be the time interval such that


$$ \frac{M + (R - r_{min}) \Delta t_2}{(N - p + 1)\, \Delta t_2} = \frac{M}{(N - p + 1)\, \Delta t_2} + \frac{R - r_{min}}{N - p + 1} = \frac{R}{N - p + 1} + \eta. \tag{7} $$

² This is a realistic assumption since the total number of active users is bounded by the total number of users
who have accounts on that computer.

Since the first term on the left-hand side monotonically decreases with the interval $\Delta t_2$, this
interval represents the maximum interval of time for which the $(N - p + 1)$ processors can be
paid at a rate greater than $\frac{R}{N - p + 1} + \eta$ dollars/minute. Hence the job $J$ will be scheduled by
$t = \Delta t_1 + T_{max} + \Delta t_2$, since then it can pay more than $\frac{R}{N - p + 1} + \eta$ dollars/minute (from Equation
(6)).
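
To get a feeling for the magnitudes involved, the two intervals can be computed numerically. The Python sketch below solves Equations (6) and (7) for $\Delta t_1$ and $\Delta t_2$, reading them as $r_{min}\Delta t_1 / (pT + (p-1)T_{max}) = R/(N-p+1) + \eta$ and $M/((N-p+1)\Delta t_2) + (R - r_{min})/(N-p+1) = R/(N-p+1) + \eta$; this reading, the function name, and the sample numbers are assumptions made for illustration only.

```python
def starvation_intervals(p, T, N, T_max, r_min, R, M, eta):
    """Solve the two waiting-time equalities for dt1 and dt2.

    p, T   -- processors and service time requested by job J
    N      -- processors in the machine
    T_max  -- upper bound on any job's running time
    r_min  -- lower bound on the transfer rate into an expense account
    R, M   -- total income rate and total savings bound over all users
    eta    -- arbitrary positive constant
    """
    slots = N - p + 1
    target = R / slots + eta  # price per minute J must be able to pay
    dt1 = (p * T + (p - 1) * T_max) * target / r_min
    dt2 = M / (r_min + eta * slots)
    return dt1, dt2


dt1, dt2 = starvation_intervals(
    p=1, T=10.0, N=4, T_max=20.0, r_min=1.0, R=8.0, M=100.0, eta=0.5
)
```

Both intervals are finite whenever $r_{min} > 0$, which is the essence of the non-starvation argument.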

4 Experimental Results 

To validate our model, we have implemented a simple simulator in which we consider a parallel
computer with $N = 128$ identical processors and 10 independent users. We assume that jobs of
any user belong to only one of three classes (see Table 2). The jobs are assumed to come from a
single Poisson source with mean arrival rate $\lambda$ (measured in jobs/minute). By the decomposition
property of a single Poisson process into $m$ output streams ([18], Sec. 6.4), we can divide the
initial job stream into ten independent streams, and therefore every user $i$ can be modeled as
an independent Poisson source from which jobs arrive with a mean rate $p_i \lambda$ (where $p_i$ is the
probability that a job comes from user $i$). Further, we denote by $q_{i1}$, $q_{i2}$ and $q_{i3}$ the probability
that a job that comes from user $i$ belongs to class 1, 2 and 3, respectively. Thus, the mean arrival
rate of a job from user $i$ belonging to class $j$ is $q_{ij} p_i \lambda$.

The job service time is assumed to have a biphase hyperexponential distribution [13]. The
relative values for the average service time and coefficient of variation for each class (see Table 2)
are derived from the observed workload on an Intel iPSC/860 hypercube at NASA Ames, reported
by Feitelson and Nitzberg [6].³

In the following discussion, for ease of notation, we number all the jobs in the system during
the simulation from $J_1$ to $J_n$. Let $T_i$ represent the execution time of $J_i$, using the number
of processors requested by the job. Let $s_i$ represent the system response time for job $J_i$, the
difference between the time when the job completes execution and the time when the job is
submitted by the user. Thus $s_i = T_i + w_i$, where $w_i$ is the time the job $J_i$ waits before it is
executed. Denote the ratio between the system response time and the service time for job $J_i$ by
$u_i = s_i/T_i$. Observe that $u_i$ is greater than or equal to one.

Following Naik, Setia and Squillante [16], we use two performance metrics in analyzing the 
model: the mean system response time $S$, and the mean ratio of a job's system response time to 
its service time $U$ (we call this mean user response for short): 

\[ S = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} s_i ; \qquad U = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} u_i \tag{8} \]
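
For a finite simulation run, the two metrics can be estimated by simple averages over the completed jobs. The following sketch assumes each job is recorded as a hypothetical `(submit_time, start_time, finish_time)` triple; this record layout is an assumption, not taken from the simulator.

```python
def response_metrics(jobs):
    """Estimate (S, U) from completed jobs.

    jobs -- iterable of (submit_time, start_time, finish_time) triples
    S is the mean system response time s_i = T_i + w_i;
    U is the mean of the ratios u_i = s_i / T_i.
    """
    responses = []
    ratios = []
    for submit, start, finish in jobs:
        service = finish - start    # T_i: execution time on the requested processors
        response = finish - submit  # s_i: waiting time plus execution time
        responses.append(response)
        ratios.append(response / service)
    n = len(responses)
    return sum(responses) / n, sum(ratios) / n
```

Two jobs submitted at time 0 with a service time of 10 each, run one after the other, have response times 10 and 20, so the estimate is $S = 15$ and $U = 1.5$.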

²Since we consider a more general architecture than an iPSC/860 hypercube, we assume that the number 
of processors that a job requests is uniformly distributed. For example, a job that takes 64 processors on a 
hypercube is assumed to request any number of processors between 32 and 64, with equal probability. Also, we 
have omitted the very large jobs that request all 128 processors in Feitelson and Nitzberg's data, since these jobs 
are run at night, when the load is light. Finally, we have not used the absolute values for service times as given 
in [6]; instead we have chosen values that approximate the ratios between the service times of different classes. 



Class    Number of     Service    Coefficient of    $q_{ij}$
type     processors    time       variation
------------------------------------------------------------
1        1-16          50         1                 0.7
2        16-32         100        2.5               0.2
3        32-64         200        1.5               0.1

Table 2: The workload characteristics. 


Note that $S$ measures the performance from the system's point of view, while $U$ measures the 
performance from the user's point of view [16]. 

Let $E$ be the mean of the cumulative computation time over all jobs submitted to the system. 
Then, we define the system load $\rho$ as the ratio between the total demand received by the 
system in one time unit ($\lambda E$), and the available computation time per time unit ($N$, since there 
are $N$ processors); i.e., $\rho = \lambda E / N$. 
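
The definition $\rho = \lambda E / N$ can be inverted to find the arrival rate that produces a given load. The sketch below takes class probabilities, processor ranges and mean service times as read from Table 2, and assumes that the processor request is uniform over the inclusive integer range and independent of the service time, so that the mean cumulative computation time of a class is the mean request times the mean service time. These modeling details and all names are assumptions for illustration.

```python
N_PROCESSORS = 128

# (class probability q_j, min processors, max processors, mean service time)
WORKLOAD = [
    (0.7, 1, 16, 50.0),
    (0.2, 16, 32, 100.0),
    (0.1, 32, 64, 200.0),
]


def mean_cumulative_time(classes):
    """E: mean over jobs of (processors requested) * (service time)."""
    return sum(q * (lo + hi) / 2.0 * service for q, lo, hi, service in classes)


def system_load(lam, n_procs, classes):
    """rho = lambda * E / N."""
    return lam * mean_cumulative_time(classes) / n_procs


def arrival_rate_for_load(rho, n_procs, classes):
    """Arrival rate lambda that yields system load rho."""
    return rho * n_procs / mean_cumulative_time(classes)
```

Under these assumptions $E = 0.7 \cdot 8.5 \cdot 50 + 0.2 \cdot 24 \cdot 100 + 0.1 \cdot 48 \cdot 200 = 1737.5$ processor-time units per job, so a load of $\rho = 0.9$ on 128 processors corresponds to $\lambda = 0.9 \cdot 128 / 1737.5 \approx 0.066$ jobs per time unit.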

In the first experiment we compare the microeconomic scheduling policy (ECON) with two 
different variable-partitioning (VP) policies ([5], Sec. 3.2.3). A VP policy allocates to each job 
the exact number of processors it requests; the processors are not partitioned into predetermined 
subsets. The two policies we consider are the following: 

• FCFS This is the simplest policy. The jobs are placed in a first-come first-served (FCFS) 
queue; if there are enough free processors then the first job from the queue is scheduled for 
execution. If not, the job waits until the requested number of processors becomes free. 

• RES In this case, if a sufficient number of processors is not available to run the next job 
from the queue, the scheduler reserves processors for this job for the earliest time in the 
future when the required number of processors is available. Further, to make use of the 
idle processors until that time, the scheduler searches the queue and schedules the earliest 
jobs whose requests can be satisfied before these processors need to be dedicated to the job 
with the reservation. 
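
The difference between the two policies can be sketched as a single scheduling decision in Python. This is an illustrative sketch, not the simulator's code: the `Job` record, the function names, and the assumption that each job's run time is known when it is queued (which RES needs in order to decide whether a job finishes before the reservation) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int      # number of processors requested
    runtime: float  # assumed known in advance (needed by RES)


def fcfs_start(queue, free):
    """Jobs that FCFS starts now: queue heads, in order, while they fit."""
    started = []
    for job in queue:
        if job.procs > free:
            break  # the head of the queue blocks every job behind it
        started.append(job)
        free -= job.procs
    return started


def res_start(queue, free, running, now=0.0):
    """Jobs that RES starts now.

    running -- list of (end_time, procs) pairs for jobs currently executing
    """
    started = []
    pending = list(queue)
    while pending and pending[0].procs <= free:
        job = pending.pop(0)
        started.append(job)
        free -= job.procs
        running = running + [(now + job.runtime, job.procs)]
    if not pending:
        return started
    head = pending[0]
    # Reservation: earliest completion time at which the head job fits.
    available, reserve_at = free, None
    for end_time, procs in sorted(running):
        available += procs
        if available >= head.procs:
            reserve_at = end_time
            break
    if reserve_at is None:
        return started  # the head job can never fit on this machine
    # Use idle processors for later jobs that finish before the reservation.
    for job in pending[1:]:
        if job.procs <= free and now + job.runtime <= reserve_at:
            started.append(job)
            free -= job.procs
    return started
```

With 28 free processors, a running job that releases 100 processors at time 20, and a queue holding a 100-processor job followed by two 10-processor jobs with run times 5 and 500, `fcfs_start` starts nothing, while `res_start` starts only the short 10-processor job, since the long one would still be running when the reservation begins.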

The FCFS policy is expected to perform the worst among these policies, since it tends to heavily 
penalize small jobs when the system load is high. To see this, suppose the first job in the queue asks 
for a large number of processors and its request cannot be satisfied. Then subsequent jobs have 
to wait, even if there are enough free processors in the system to satisfy their needs. The RES 
policy eliminates this problem; if a large job cannot run immediately, the scheduler searches for 
subsequent jobs whose requests can be satisfied. Notice that the RES policy is a special case of 
the ECON policy in which the income rate of every user is zero (if we assume that the scheduler 
selects the job that arrives first among jobs that offer the same price). 

In the ECON policy we assume that every user has the same income rate, equal to 100 
dollars/minute. We also assume that a user distributes his income equally between jobs from 
different classes, and thus the coefficient associated with each class is equal to 1/3. In each of 
the following experiments, we generate a system load $\rho$ between 0.1 and 0.9 by suitably varying 
$\lambda$. To attain steady state we run each experiment (for every value of $\rho$) for 500,000 time-units.³ 

Figure 3(a) shows the mean system response time, $S$, for all three policies for values of $\rho$ 
between 0.1 and 0.9. When $\rho < 0.3$, all the policies offer almost the same performance. In 
this regime, there are few jobs in the system and there are enough processors to satisfy all the 
incoming requests. Next, for $\rho > 0.3$ the mean response time for the FCFS policy begins to 
increase sharply. This is because the large jobs monopolize the resources at the expense of 
small jobs. Finally, when $\rho$ exceeds 0.1, the ECON policy begins to outperform the RES policy. The 
improvement in $S$ obtained with ECON over RES is significant: when $\rho = 0.9$, $S$ decreases by 
more than 34%. Figures 3(b), 3(c) and 3(d) compare the system response times for each class 
of jobs. As expected, the biggest gain is for small and medium jobs (classes 1 and 2). This is 
because the ECON policy asks a job to pay not only for the computation time it needs, but 
also for the wasted time. This favors smaller jobs, since we expect that the larger the number 
of processors a job requests, the greater is the wasted time for which the job has to pay. Next, 
Figures 4(a), 4(b), 4(c) and 4(d) show the mean user response ($U$) for the three policies: first, 
for all jobs combined, and next, for each class of jobs. The behavior of the mean user response as 
a function of the arrival rate, and as a function of the job class, is quite similar to the behavior 
of the system response time; the advantage of the ECON policy relative to the other policies is 
even greater. 

In the next experiment we study how the user income rate influences the user performance. 
For this experiment we consider three different income rates for the first user: 50, 100 and 200 
dollars/minute, while the income rates for all other users remain unchanged at 100 dollars/minute. 
Let $W(R_i)$ denote the mean user waiting time for user $i$ when his income rate is $R_i$. Figure 5 
shows that the waiting time for the first user is inversely proportional to his income rate when 
the mean job arrival rate is sufficiently large. For instance, when $\rho = 0.9$ and $R_1 = 50$ 
dollars/minute, the mean user waiting time is lh6% of the value when $R_1 = 100$ dollars/minute, 
while for $R_1 = 200$ dollars/minute it is 55% of this value. To see why this happens, consider the 
case when $R_1 = 200$ dollars/minute. Since the first user receives twice as much income as the 
others, he can transfer money to his jobs roughly twice as fast. Therefore the price per minute 
offered by his jobs increases proportionally faster, and consequently the mean waiting time of 
these jobs is reduced by approximately a half. 
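
As a rough consistency check, consider the idealized case in which the mean waiting time is exactly inversely proportional to the income rate. This is an assumption made only for comparison, not a result of the simulation; the symbol $W$ for the mean waiting time is used for illustration.

```latex
W(R_1) \propto \frac{1}{R_1}
\quad\Longrightarrow\quad
\frac{W(50)}{W(100)} = \frac{100}{50} = 2,
\qquad
\frac{W(200)}{W(100)} = \frac{100}{200} = 0.5 .
```

The 55% reported for an income rate of 200 dollars/minute is close to the idealized 50%.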

In the last experiment we evaluate an adaptive strategy that controls the relative user response 
for each class. More specifically, let $U_1$, $U_2$ and $U_3$ be the mean user responses for jobs in class 1, 
class 2 and class 3, respectively. Our goal is to enforce certain ratios between the mean user 
responses for each class of jobs, i.e., $U_1 : U_2 : U_3 = \alpha_1 : \alpha_2 : \alpha_3$, where $\alpha_1$, $\alpha_2$ and $\alpha_3$ are 
predefined constants. In other words, we would like each class to satisfy 

\[ \frac{U_i}{U_1 + U_2 + U_3} = \frac{\alpha_i}{\alpha_1 + \alpha_2 + \alpha_3}, \qquad \text{for } 1 \le i \le 3. \]

³In the current implementation we have not changed the price of computation over time as described in 
Equation (2); since we consider only constant workloads ($\rho$ is fixed), we assume that the price is also constant in 
the steady state. 

Figure 5: The mean waiting time for user 1 for three income rates: 50, 100 and 200 dollars/minute. 
All other users have an income rate of 100 dollars/minute. 

Figure 6: The measured mean user responses for each class: $U_1$, $U_2$, $U_3$. The desired ratios are: 
(a) $U_1 : U_2 : U_3 = 1 : 2 : 2$, and (b) $U_1 : U_2 : U_3 = 1 : 1 : 2$. 

To achieve this objective, the user periodically adjusts the coefficients associated with every class 
(see Section 2.1) according to the following equations: 


\[ a_i^{k} = a_i^{k-1} \cdot \frac{U_i^{k-1}}{U_1^{k-1} + U_2^{k-1} + U_3^{k-1}} \cdot \frac{\alpha_1 + \alpha_2 + \alpha_3}{\alpha_i} \]


where $U_i^k$ represents the mean user response for jobs belonging to class $i$ at the $k$-th iteration. 
Obviously, we have $U_i = \lim_{k \to \infty} U_i^k$. Notice that whenever $U_i^{k-1}$ is larger than expected, i.e., 
$U_i^{k-1} / (U_1^{k-1} + U_2^{k-1} + U_3^{k-1}) > \alpha_i / (\alpha_1 + \alpha_2 + \alpha_3)$, then $a_i^k$ increases and therefore the jobs in class $i$ will 
receive a larger share of the user income. Conversely, if $U_i^{k-1}$ is smaller than expected, then the 
user decreases the share of the income allocated to jobs in class $i$. We update the coefficients 
every 2,000 time-units in our experiment. 
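
One iteration of this adjustment can be sketched in Python as follows. The sketch assumes the multiplicative form of the update (previous coefficient, times the measured share of the mean user response, divided by the target share). The function name and the optional renormalization step are assumptions added for illustration; the text does not say whether the coefficients are rescaled to sum to one.

```python
def update_coefficients(coeffs, responses, targets, normalize=False):
    """One adaptive step for the per-class coefficients.

    coeffs    -- coefficients from the previous iteration
    responses -- measured mean user response per class in the previous iteration
    targets   -- desired ratios between the mean user responses
    normalize -- hypothetical option: rescale the result to sum to 1
    """
    total_response = sum(responses)
    total_target = sum(targets)
    updated = [
        c * (u / total_response) / (a / total_target)
        for c, u, a in zip(coeffs, responses, targets)
    ]
    if normalize:
        scale = sum(updated)
        updated = [c / scale for c in updated]
    return updated
```

For example, with equal coefficients of 1/3, measured responses 2, 3 and 5, and targets 1 : 2 : 2, the measured shares are 0.2, 0.3 and 0.5 against target shares of 0.2, 0.4 and 0.4. The first coefficient is unchanged, the second is scaled by 0.75 and the third by 1.25. When the measured responses already match the target ratios, the coefficients do not change.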

Figure 6(a) shows the mean user response for each class of jobs when $\alpha_1 = 1$ and $\alpha_2 = \alpha_3 = 2$. 
Again, when the system load is low there is not much the algorithm can do, since there are 
few jobs in the system and the resources are plentiful. On the other hand, adaptive control 
becomes increasingly effective as the system load increases. For example, when $\rho = 0.9$, the 
measured mean user response ratios are $U_1 : U_2 : U_3 = 1 : 1.97 : 2.00$, which is close to the 
prescribed ratios $1 : 2 : 2$. Finally, Figure 6(b) shows the mean user responses for a different 
set of ratios: $\alpha_1 = \alpha_2 = 1$ and $\alpha_3 = 2$. In this case, when $\rho = 0.9$, the measured ratios 
$1 : 1.01 : 2.08$ are again close to the prescribed ratios. 


5 Conclusions and Future Work 

We have applied the microeconomic paradigm to schedule computation-bounded jobs on parallel 
systems. Our simulation results show that the microeconomic scheduler compares favorably with 
other variable partitioning policies both in terms of system and user performances. Additionally, 
the scheduler guarantees an adequate level of fairness in allocating resources among the users. 
Finally, by using a simple adaptive mechanism that adjusts the rate at which money is transferred 
from the user savings account to a job expense account, the scheduler controls the relative job 
performances. 

Many open problems remain. 

We are currently extending the model to schedule jobs that specify a minimum and a maximum 
number of processors, and which can be allocated a number of processors within this interval at 
load-time. (We also intend to consider jobs that can dynamically change the number of processors 
during execution.) The idea is to study the trade-off between the number of processors a job 
requests and the price it has to pay. Notice that if a job $J_{ik}$ reduces the number of processors 
it requests, then the price it pays decreases for two reasons. First, the wasted time a job pays 
for decreases with fewer processors. Second, the cumulative computation time ($E_{ik}$) decreases if 
the job's speedup is sub-linear. Moreover, when requesting fewer processors, the waiting time is 
also likely to decrease. Therefore it would be possible for a job to obtain a better response time 
using fewer processors and paying less (if the decrease in the waiting time offsets the increase in 
the service time $D_{ik}$). 
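
The second reason can be made concrete with a small worked example. It assumes, purely for illustration, that the job follows Amdahl's law with serial fraction $f$; the model does not prescribe any particular speedup curve, and the symbols $T(p)$ and $E(p)$ for the service time and cumulative computation time on $p$ processors are introduced here only for this example.

```latex
T(p) = T(1)\left(f + \frac{1-f}{p}\right),
\qquad
E(p) = p\,T(p) = T(1)\bigl(1 - f + f\,p\bigr),
\qquad
E(p) - E(p') = T(1)\,f\,(p - p') > 0 \quad \text{for } p > p'.

% Example with f = 0.1 and T(1) = 100:
%   p = 64:  T = 11.41,  E = 730
%   p = 32:  T = 12.81,  E = 410
```

In this example, halving the request from 64 to 32 processors lengthens the service time by about 12% while reducing the cumulative computation time by about 44%.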

A second area for future work is to extend the model to other system resources such as memory 
and I/O bandwidth. One difficulty here is correlating the allocation of the various resources. 
For example, when a job buys computation time it also has to buy enough memory; otherwise, 
instead of computing, it has to wait for the memory pages to be swapped in and out. 

Third, it will be interesting to explore other policies for transferring funds from a user's savings 
account to a job expense account. It might be worth considering variable user income rates. The 
idea would be to allocate a share of the system resources to every user and then to dynamically 
adjust the income rate in order to ensure that every user receives his share. Here, the trade-off 
is between increasing algorithm overhead and increasing accuracy of control. 

We believe that the microeconomic paradigm may serve as a unifying theme for multiprocessor 
scheduling. We have seen that the variable partitioning scheme with job reservations is a special 
case of the microeconomic scheduler (when the income rates are zero). We hope to show in future 
work that other scheduling policies might also be obtained by suitably choosing the parameters 
in the microeconomic paradigm. 